Develop#296
Merged
Merged
Conversation
…erlap check
Two hot-path improvements in the variant-generation phase, identified by a
post-v4.4.3 cProfile run on ecoli 2x SE (single-thread baseline 158.6 s).
1. ContigVariants.check_if_del / check_if_ins — replace O(N) linear scans
over `all_dels` / `all_ins` with O(1) lookups into per-position bucket
dicts. Each indel is indexed at every reference position it covers when
added; the total memory cost is O(sum of indel lengths), single-digit
MB at human scale. Profile on 14,052 calls:
check_if_del: 3.624 s -> 0.011 s (~325x)
check_if_ins: 3.700 s -> 0.007 s (~530x)
The previous representation was O(N**2) over a single variant-generation
pass and would scale poorly on larger genomes.
2. variant_models.map_local_trinuc_bias — replace 64 regex scans per call
(one re.finditer per trinucleotide string) with a single vectorized
numpy expression that encodes each base 0-3, builds a trinuc-index
array via `left*16 + mid*4 + right`, and looks up biases in one table
read. The cache check is also tightened from `==` (O(N) byte compare)
to `is` (O(1) object identity) since slices are different objects
anyway. Profile on 14,100 calls:
map_local_trinuc_bias: 13.491 s -> 0.811 s (~16x)
Combined effect on the single-thread ecoli 2x SE cProfile:
Total wall: 158.6 s -> 135.3 s (-15%, -23.3 s)
generate_variants cumtime: 24.9 s -> 4.2 s (~5.9x)
Variant generation is no longer in the top-5 hot path.
Tests: all 599 pass. The `import re` in variant_models.py is no longer used
and has been removed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
write_bam_record was the single biggest hot path after the v4.4.3 variant
work — 28 M Seq.__getitem__ calls per 185 k-read run, plus a per-base list
comp for the quality string and 9 small struct.pack calls per record.
Replace with bulk numpy / single struct.pack operations:
- Sequence encoding: replace the per-byte Python loop
for i in range(encoded_len):
encoded_seq.extend(pack('<B',
(SEQ_PACKED[alt_sequence[2*i]] << 4) +
SEQ_PACKED[alt_sequence[2*i+1]]))
with a vectorized lookup through a 256-entry table:
codes = SEQ_PACKED_LOOKUP[np.frombuffer(seq_str.encode('ascii'), ...)]
encoded_seq = ((codes[0::2] << 4) | codes[1::2]).tobytes()
Eliminates the 28 M Biopython Seq.__getitem__ calls that dominated cumtime.
- Quality encoding: replace
encoded_qual = "".join([chr(n) for n in read.quality_array]).encode(...)
with one `np.asarray(quality_array, dtype=np.uint8).tobytes()` call.
- CIGAR ops: pack all in a single `struct.pack(f'<{n}I', ...)` instead of
one pack per op + bytearray.extend. Also drop the regex CIGAR re-parse in
favour of a single linear scan over the CIGAR string (no regex import
needed; `re` removed).
- BAM record header: one `struct.pack('<iiiIIiiii', ...)` for the nine
fixed-size fields instead of nine separate pack calls glued with `+`.
Effect (clean machine, ecoli 10x 4-thread untimed):
SE wall: 1:35.3 (v4.4.3) -> 1:07.1 (1.42x faster, -30%)
PE wall: 1:34.5 (v4.4.3) -> 1:05.9 (1.43x faster, -30%)
SE CPU%: 348 -> 364 PE CPU%: 359 -> 378 (workers near-saturated)
Single-thread cProfile of write_bam_record itself:
self time: 12.0 s -> 3.2 s (3.75x)
cumtime: 58.1 s -> 8.1 s (7.2x)
Combined with the v4.4.4 variant-generation fixes (committed separately) and
the v4.4.3 architectural work, NEAT is now 13.3-13.4x faster than v4.4.2 on
the ecoli benchmark, with strictly coordinate-sorted output and bounded
per-worker memory.
All 599 tests pass. Bumps version to 4.4.4 for the release.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Records the three hot-path fixes in this release with before/after numbers: - ContigVariants overlap-check O(1) via position-bucket dicts - map_local_trinuc_bias vectorization (16x) - write_bam_record vectorization (7.2x cumtime) Total release effect on ecoli 10x 4-thread: SE 1:35 -> 1:07 (1.42x) PE 1:34 -> 1:06 (1.43x) 13.3-13.4x faster than v4.4.2 cumulative. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Low-risk audit items applied across the README:
- Prerequisites: poetry pinned to >= 2.0 (was 1.3.*, no longer accurate against
pyproject.toml's poetry-core>=2.0 requirement). Added version constraints
for numpy (>=2.2) and pysam (>=0.23) to match pyproject.
- Removed the OS-support contradiction. The Prerequisites section already
notes "NEAT assumes a Linux environment ... install WSL on other OSes"; the
installation section now agrees ("Linux tested; macOS may work; Windows via
WSL") instead of contradicting it with the old "supports MacOS, Windows,
and Linux" claim.
- Usage table: corrected `mode`/`size` to the actual config keys
`parallel_mode`/`parallel_block_size`. Updated descriptions for the new
defaults (`parallel_mode: size` for threads>1, forced to `contig` for
threads==1; `parallel_block_size: 0` = auto-tune).
- Estimated-runtimes banner: updated to point at v4.4.4 numbers (~13× faster
than v4.4.2 on ecoli) and to reference both v4.4.3 and v4.4.4 ChangeLog
entries.
- Estimated-runtimes config snippets: same `mode`/`size` → `parallel_mode`/
`parallel_block_size` correction.
- "Insert specific variants" example: dropped the `-v`/`-M` CLI flag prose
(those flags were removed when NEAT switched to config-file-only).
- Three remaining Examples sections (Insert variants, Single-end, Large
single-end): matched the fence formatting pattern (separate yml + bash
blocks, "Contents of neat_config.yml:" label) that was already applied to
the first two examples.
- Long-read caveat: reworded "not yet implemented in NEAT 4.0" to "planned
for a separate future release; not yet fully validated in NEAT 4.4".
- Removed the `neat vcf_compare` section entirely. The subcommand does not
exist (confirmed via `neat --help`) and the section was documenting an
unimplemented interface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…om threads
`parallel_mode` was a YAML option with two values, `'size'` and `'contig'`,
which never actually let the user change behavior:
- When `threads == 1`, Options force-set it to `'contig'` regardless of
user input.
- When `threads > 1`, `'size'` was the default; `'contig'` would have
yielded one worker per reference contig, but `chunk_record` in
split_inputs already produces a single chunk for contigs ≤ chunk size,
so `'size'` is a strict behavioral superset.
Removing it cleans up dead config and eliminates a footgun (a user setting
`parallel_mode: contig` with `threads: 8` thinking they'd get one-worker-
per-contig would in fact just disable sub-contig splitting on large
contigs and silently reduce parallelism).
Changes:
- options.py: drop `parallel_mode` from the Options __init__ signature,
schema, and stored state. Replace the post-parse log block with a
threads-keyed one. Add a `DEPRECATED_KEYS` table and a deprecation
warning in `read_yaml` so old configs with `parallel_mode: ...` keep
parsing (the key is silently ignored after a one-line warning).
- split_inputs.py: branch on `options.threads == 1` instead of
`options.parallel_mode == "contig"` for one-chunk-per-contig output.
- runner.py: simplify the auto-tune gate; drop the
`options.parallel_mode == "size"` predicate (always true when
`threads > 1`).
- config_template/template_neat_config.yml,
config_template/simple_template.yml: drop the `parallel_mode: .` line
and tighten the comment for `parallel_block_size`.
- README.md: drop the `parallel_mode` table row; clean up references in
the Estimated runtimes section and HPC examples; tighten the
parallel_block_size description.
- tests/test_models/test_split_inputs.py: the `mode=` parameter in the
test helper now maps to `opts.threads` (1 for contig-style, 4 for
size-style) — callsites unchanged.
- tests/test_read_simulator/test_options.py:
- drop `parallel_mode` lines from the embedded YAML samples
- drop the `assert opts.parallel_mode == "contig"` default-values
assertion
- swap the `parallel_mode` example in the choice-validator tests for a
synthetic key (`_test_choice_key`) since no production option uses
the `choice` validator anymore
- replace `test_log_configuration_threads_one_forces_contig` (which
was checking a force-override that no longer exists) with
`test_read_yaml_deprecated_parallel_mode_warns` (verifies the
deprecation warning + silent-ignore path works end-to-end via
`Options.from_cli`)
End-to-end check: an old YAML with `parallel_mode: size` still runs to
completion and emits a single warning line:
WARNING: `parallel_mode` is deprecated and ignored; splitting strategy
is now derived from `threads`.
All 599 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The removal is a YAML-schema cleanup; existing configs keep working via a deprecation warning path. Behavior is unchanged — parallel_mode never actually let users change anything under the old logic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-chunk read counts in cover_dataset were scaled by chunk_mean/max_weight when a non-uniform gc_model was configured, making `coverage` the peak coverage at the GC bias maximum rather than the documented genome-wide average. The fix precomputes the genome-wide mean weight once at the runner level and divides per-chunk reads by that, so the chunk-scaling ratio averages to 1.0 and total reads match coverage * genome_length / read_len. - New compute_genome_wide_gc_mean_weight helper (vectorized, reuses the same prefix-sum windowing as cover_dataset; uniform models short-circuit). - Runner stores the result on options.gc_global_mean_weight so workers don't redundantly scan the reference. - cover_dataset falls back to chunk-local mean (no rescaling) when the field is unset, preserving single-chunk unit-test behavior. Drive-by: drop dead MarkovQualityModel stub and stale TODO in test_contig_variants.py (the remove_variant fix landed separately). ChangeLog rewritten as a single v4.5.0 entry covering the perf work already shipped in v4.4.4, the parallel_mode removal, and this fix. All 603 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five comments were either factually wrong, idle speculation, or dead work-in-progress with no path back into the code: - model_sequencing_error/utils.py: drop "TODO implement plotting" — plt is in fact used heavily later in the file. Remove the commented-out shape_curves.append() + the 8-line shape_curves DataFrame scaffolding; both referenced pandas/seaborn that aren't imported and have sat dead for years. - model_fragment_lengths/utils.py: drop speculative docstring TODO about filter_lengths "maybe" being useful elsewhere. - common/ploid_functions.py: drop "TODO may need to improve the modeling for polyploid, maybe" — pure musing, not actionable as written. No behavior change; 603 tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both lists were built up in gen_mut_model/runner.py but never read or returned anywhere in the codebase — scaffolding for two features that were never wired into the saved mutation model: - common_variants: 95th-percentile-frequency variants from the input VCF. - high_mut_regions: variant-density-clustered hotspots, top 3% by density, with overlapping regions merged. The design intent (pipe both into the saved mutation model so the simulator can bias generation accordingly) is now captured in #295. If the work is revived, the cluster_list and find_caf helpers are still available in gen_mut_model/utils.py. Collateral cleanup: drop the now-unused vcf_common dict and the cluster_list / find_caf imports from runner.py. No behavior change in the produced model; 603 tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The release that adds the GC bias coverage fix, the follow-up perf pass, and the parallel_mode removal will be tagged 4.4.4 instead of 4.5.0. The GC bias fix does change output bytes when a non-uniform gc_model is configured, but the user-facing config has always documented `coverage` as the genome-wide average — the prior behavior was the bug. Shipping as a patch release is consistent with framing this as a fix rather than a deliberate behavior change. Updates: - pyproject.toml: 4.5.0 -> 4.4.4 - ChangeLog.md: rename v4.5.0 entry -> v4.4.4; benchmark column relabeled; drop the "reason for the minor version bump" sentence and replace it with concrete coverage-shortfall numbers (~70 % delivered pre-fix). - README v4.4.4 references are already correct; no edit needed. 603 tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Further refinements and enhancements related to processing time and memory. Release 4.4.4.